class: title-slide, left, bottom

# Feedforward Neural Networks as Statistical Models

----

## **Andrew McInerney**, **Kevin Burke**

### University of Limerick

#### 3rd Young-ISA Event, 7 Oct 2022

---

# Feedforward Neural Networks

.pull-left[
<img src="data:image/png;base64,#img/FNN.png" width="90%" style="display: block; margin: auto;" />
]

<br>
<br>

--

$$
`\begin{equation}
\text{NN}(x_i) = \gamma_0+\sum_{k=1}^q \gamma_k \phi \left( \sum_{j=0}^p \omega_{jk}x_{ji}\right)
\end{equation}`
$$

---

# Data Application

--

### Boston Housing Data

506 communities in Boston, MA (James et al., 2022).

--

Response:

- `medv` (median value of owner-occupied homes)

--

12 Explanatory Variables:

- `rm` (average number of rooms per dwelling)
- `lstat` (proportion of population that are 'lower status')

---

# R Implementation: nnet

```r
library(nnet)
nn <- nnet(medv ~ ., data = Boston, size = 8,
           maxit = 5000, linout = TRUE)
summary(nn)
```

--

```
## a 11-8-1 network with 105 weights
## options were - linear output units
##  b->h1 i1->h1 i2->h1 i3->h1 i4->h1 i5->h1 i6->h1 i7->h1 i8->h1 i9->h1
##   2.79   5.92   0.34   1.31   0.23  -1.31  -2.67   0.77  -0.22   1.46
## i10->h1 i11->h1
##    1.20    1.26
##  b->h2 i1->h2 i2->h2 i3->h2 i4->h2 i5->h2 i6->h2 i7->h2 i8->h2 i9->h2
##  20.53   5.59   3.52  -0.64  12.64  -5.25  -4.12   0.24   2.64   0.49
## i10->h2 i11->h2
##  -21.17    4.03
## [...]
```

---

# Statistical Perspective

--

$$
y_i = \text{NN}(x_i) + \varepsilon_i,
$$

--

where

$$
\varepsilon_i \sim N(0, \sigma^2)
$$

<br>

--

$$
\ell(\theta)= -\frac{n}{2}\log(2\pi\sigma^2)-\frac{1}{2\sigma^2}\sum_{i=1}^n(y_i-g(x_i))^2
$$

---

# Uncertainty Quantification

Then, as `\(n \to \infty\)`,

$$
\hat{\theta} \sim N[\theta, \Sigma = \mathcal{I}(\theta)^{-1}]
$$

--

Estimate `\(\Sigma\)` using

$$
\hat{\Sigma} = I_o(\hat{\theta})^{-1}
$$

--

<br>

However, inverting `\(I_o(\hat{\theta})\)` can be problematic.

---

# Redundancy

- Redundant hidden nodes can make some of the parameters unidentifiable (Fukumizu, 1996).
<br>

--

- Redundant hidden nodes `\(\implies\)` singular information matrix.

<br>

--

- Model selection is required.

---

# Model Selection

--

<img src="data:image/png;base64,#img/modelsel.png" width="90%" style="display: block; margin: auto;" />

--

A Statistically-Based Approach to Feedforward Neural Network Model Selection (arXiv:2207.04248)

---

# Hypothesis Testing

Wald test:

--

$$
`\begin{equation}
\omega_j = (\omega_{j1},\omega_{j2},\dotsc,\omega_{jq})^T
\end{equation}`
$$

--

$$
`\begin{equation}
H_0: \omega_j = 0
\end{equation}`
$$

--

$$
`\begin{equation}
(\hat{\omega}_{j} - \omega_j)^T\Sigma_{\hat{\omega}_{j}}^{-1}(\hat{\omega}_{j} - \omega_j) \sim \chi^2_q
\end{equation}`
$$

--

Likelihood ratio test:

$$
`\begin{equation}
2(\ell_1 - \ell_0) \sim \chi^2_q
\end{equation}`
$$

---

# Covariate-Effect Plots

We propose covariate-effect plots of the following form:

--

$$
`\begin{equation}
\hat{\beta}_j(x) = \frac{1}{n}\sum_{i=1}^n \left[ g(x + \sigma_{j}, X \setminus x_{ij}) - g(x, X \setminus x_{ij}) \right]
\end{equation}`
$$

--

and their associated uncertainty:

--

$$
`\begin{equation}
\hat{\beta}_j(x) \sim N[\beta_j(x), \nabla_\theta^T \beta_j(x) ~ \Sigma ~ \nabla_\theta \beta_j(x)]
\end{equation}`
$$

---

# R Implementation

.left-column[
<br>
<img src="data:image/png;base64,#img/statnnet.png" width="80%" style="display: block; margin: auto;" />
]

--

.right-column[
<br>
<br>

```r
# install.packages("devtools")
library(devtools)
install_github("andrew-mcinerney/statnnet")
```
]

---

# Data Application (Revisited)

### Boston Housing Data

506 communities in Boston, MA (James et al., 2022).

--

Response:

- `medv` (median value of owner-occupied homes)

--

12 Explanatory Variables:

- `rm` (average number of rooms per dwelling)
- `lstat` (proportion of population that are 'lower status')

---

# Boston Housing: Model Selection

```r
library(statnnet)
nn <- selectnn(medv ~ ., data = Boston, Q = 10,
               n_init = 10, maxit = 5000)
summary(nn)
```

--

```
## Call:
## selectnn.formula(formula = medv ~ ., data = Boston, Q = 10, n_init = 10,
##     maxit = 5000)
##
## Number of input nodes: 8
## Number of hidden nodes: 4
##
## Inputs:
##  Covariate Selected Delta.BIC
##         rm      Yes   236.907
##      lstat      Yes   168.023
## [...]
```

---

# Boston Housing: Model Summary

```r
stnn <- statnnet(nn)
summary(stnn)
```

--

```
## [...]
## Coefficients:
##          Estimate Std. Error      X^2 Pr(> X^2)
## crim    -0.115769   0.019085 109.8369  0.00e+00 ***
## indus   -0.176500   0.018028  51.6302  1.65e-10 ***
## nox     -0.163091   0.020639  39.4919  5.51e-08 ***
## rm       0.201211   0.017924  45.5051  3.12e-09 ***
## dis      0.101701   0.022437  14.6031  5.60e-03 **
## rad     -0.099667   0.019687 107.3354  0.00e+00 ***
## ptratio -0.192649   0.016672   7.8733  9.63e-02 .
## lstat   -0.263402   0.014443  50.2500  3.20e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Weights:
## [...]
```

---

# Boston Housing: Plots

```r
plot(stnn, conf_int = TRUE, method = "deltamethod", which = c(4, 8))
```

--

.pull-left[
<!-- -->
]

--

.pull-right[
<!-- -->
]

---

class: final-slide

# Summary & References

### Summary

Our package `statnnet` extends existing neural network packages (currently `nnet`) to produce more interpretable output.

### References

<font size="5">Fukumizu, K. (1996). A regularity condition of the information matrix of a multilayer perceptron network. Neural Networks, 9(5):871&ndash;879.</font>
<br>
<font size="5">James, G., Witten, D., Hastie, T., and Tibshirani, R. (2022). ISLR2: Introduction to Statistical Learning, Second Edition. R package version 1.3-1.</font>
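
---

# Appendix: Covariate Effects by Hand

As an illustrative sketch (not the `statnnet` implementation), the covariate-effect estimate `\(\hat{\beta}_j(x)\)` can be computed directly from any fitted `nnet` model with `predict()`: fix covariate `\(j\)` at `\(x\)` for every observation, shift it by `\(\sigma_j\)`, and average the change in predictions. The simulated data, network size, and choice of `\(\sigma_j\)` below are assumptions for illustration only:

```r
library(nnet)

# Simulated data, purely for illustration
set.seed(1)
n   <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- sin(dat$x1) + 0.5 * dat$x2 + rnorm(n, sd = 0.1)

nn <- nnet(y ~ x1 + x2, data = dat, size = 3, maxit = 2000,
           linout = TRUE, trace = FALSE)

# beta_hat_1(x): average change in the fitted prediction when x1 moves
# from x to x + sigma_1, with the other covariates at observed values
effect_x1 <- function(x, sigma_1) {
  d0 <- dat; d0$x1 <- x
  d1 <- dat; d1$x1 <- x + sigma_1
  mean(predict(nn, d1) - predict(nn, d0))
}

effect_x1(0, sd(dat$x1))
```

Evaluating `effect_x1` over a grid of `\(x\)` values traces out the covariate-effect curve that `plot(stnn, ...)` draws, with confidence bands then obtained via the delta method.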